10 research outputs found

    Une approche flexible et décentralisée du traitement de requêtes dans les systèmes géo-distribués

    This thesis studies the design of query processing systems across a diversity of geo-distributed settings. Optimising performance metrics such as response time, freshness, or operational cost involves design decisions, such as what derived state (e.g., indexes, materialised views, or caches) to maintain, how to distribute the corresponding computation and state, and where to place them. These metrics are often in tension, and the trade-offs depend on the specific application and/or environment. This requires the ability to adapt the query engine's topology and architecture, and the placement of its components. This thesis makes the following contributions:
    - A flexible architecture for geo-distributed query engines, based on components connected in a bidirectional acyclic graph.
    - A common microservice abstraction and API for these components, the Query Processing Unit (QPU). A QPU encapsulates some primitive query processing task. Multiple QPU types exist, which can be instantiated and composed into complex graphs.
    - A model for constructing modular query engine architectures as a distributed topology of QPUs, enabling flexible design and trade-offs between performance metrics.
    - Proteus, a QPU-based framework for constructing and deploying query engines.
    - Representative deployments of Proteus and experimental evaluation thereof.

    Towards application-specific query processing systems

    Database systems use query processing subsystems to enable efficient query-based data retrieval. An essential aspect of designing any query-intensive application is tuning the query system to fit the application's requirements and workload characteristics. However, the configuration parameters provided by traditional database systems do not cover the design decisions and trade-offs that arise from the geo-distribution of users and data. In this paper, we present a vision towards a new type of query system architecture that addresses this challenge by enabling query systems to be designed and deployed on a per-use-case basis. We propose a distributed abstraction called Query Processing Unit that encapsulates primitive query processing tasks, and show how it can be used as a building block for assembling query systems. Using this approach, application architects can construct query systems specialized to their use cases by controlling the query system's architecture and the placement of its state. We demonstrate the expressiveness of this approach by applying it to the design of a query system that can flexibly place its state in the data center or at the edge, and show that state placement decisions affect the trade-off between query response time and query result freshness.
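The building-block idea in this abstract can be illustrated with a minimal sketch. It is not the Proteus API: the class names `QPU`, `ScanQPU`, and `CacheQPU`, and the `query(attr, value)` signature, are hypothetical and chosen only for illustration. The sketch shows two composable units and how placing cached state in front of a data source can trade result freshness for fewer round trips.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


class QPU:
    """Hypothetical base class: one primitive query processing task."""

    def query(self, attr: str, value: object) -> List[Record]:
        raise NotImplementedError


class ScanQPU(QPU):
    """Leaf unit: scans an in-memory list standing in for a data store."""

    def __init__(self, records: List[Record]) -> None:
        self.records = records

    def query(self, attr: str, value: object) -> List[Record]:
        return [r for r in self.records if r.get(attr) == value]


class CacheQPU(QPU):
    """Keeps derived state (cached results) in front of another unit."""

    def __init__(self, downstream: QPU) -> None:
        self.downstream = downstream
        self.cache: Dict[Tuple[str, object], List[Record]] = {}

    def query(self, attr: str, value: object) -> List[Record]:
        key = (attr, value)
        if key not in self.cache:
            # Only a cache miss reaches the downstream unit.
            self.cache[key] = self.downstream.query(attr, value)
        return self.cache[key]


store = [{"id": 1, "city": "Paris"}, {"id": 2, "city": "Athens"}]
scan = ScanQPU(store)
cached = CacheQPU(scan)

first = cached.query("city", "Paris")     # miss: forwarded to the scan unit
store.append({"id": 3, "city": "Paris"})  # a write arrives afterwards
fresh = scan.query("city", "Paris")       # direct scan sees the new record
stale = cached.query("city", "Paris")     # cache hit: the new record is missing
```

In this toy setup the cached path avoids contacting the data source on repeated queries but returns an outdated result after the write, which is one simple instance of the response time versus freshness trade-off the abstract mentions.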

    A Modular Design for Geo-Distributed Querying: Work in Progress Report

    Most distributed storage systems provide limited abilities for querying data by attributes other than their primary keys. Supporting efficient search on secondary attributes is challenging as applications pose varying requirements to query processing systems, and no single system design can be suitable for all needs. In this paper, we show how to overcome these challenges in order to extend distributed data stores to support queries on secondary attributes. We propose a modular architecture that is flexible and allows query processing systems to make trade-offs according to different use case requirements. We describe adaptive mechanisms that make use of this flexibility to enable query processing systems to dynamically adjust to query and write operation workloads.
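As a rough illustration of what querying by a secondary attribute involves, the sketch below pairs a key-value map with a secondary index that is updated on every write. It is a single-process toy, not the architecture proposed in the paper; the class `KVStoreWithIndex` and its methods are hypothetical names introduced here.

```python
from collections import defaultdict
from typing import Dict, List, Set


class KVStoreWithIndex:
    """Toy key-value store with one secondary index (hypothetical design)."""

    def __init__(self, indexed_attr: str) -> None:
        self.indexed_attr = indexed_attr
        self.data: Dict[str, dict] = {}
        # Maps an attribute value to the primary keys that carry it.
        self.index: Dict[object, Set[str]] = defaultdict(set)

    def put(self, key: str, record: dict) -> None:
        old = self.data.get(key)
        if old is not None and self.indexed_attr in old:
            # Remove the stale index entry before overwriting the record.
            self.index[old[self.indexed_attr]].discard(key)
        self.data[key] = record
        if self.indexed_attr in record:
            self.index[record[self.indexed_attr]].add(key)

    def find(self, value: object) -> List[str]:
        """Return the primary keys whose indexed attribute equals value."""
        return sorted(self.index.get(value, set()))


kv = KVStoreWithIndex("city")
kv.put("u1", {"city": "Paris"})
kv.put("u2", {"city": "Athens"})
kv.put("u1", {"city": "Athens"})  # update moves u1 to another index entry

in_athens = kv.find("Athens")
in_paris = kv.find("Paris")
```

Each `put` pays extra work to keep the index consistent so that `find` avoids a full scan. How much of that cost to accept, and when, is the kind of choice between query and write workloads that the abstract refers to.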

    CRDTs for truly concurrent file systems

    Building scalable and highly available geo-replicated file systems is hard. These systems need to resolve conflicts that emerge in concurrent operations in a way that maintains file system invariants, is meaningful to the user, and does not depart from the traditional file system interface. Conflict resolution in existing systems often leads to unexpected or inconsistent results. This paper introduces ElmerFS, a geo-replicated, truly concurrent file system designed with the aim of addressing these challenges. ElmerFS is based on two key ideas: (1) the use of Conflict-Free Replicated Data Types (CRDTs) for representing file system structures, which ensures that replicas converge to a correct state, and (2) conflict resolution rules, which are determined by the choice of CRDT types and their composition, designed with the principle of being intuitive to the user. We argue that if the state of the file system after resolving a conflict conveys to the user the resolved conflict in an intuitive way, the user can complement or reverse it using traditional file system operations. We discuss the challenges in the design of geo-replicated weakly consistent file systems, and present the design of ElmerFS.
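The convergence property attributed to CRDTs can be shown with a generic toy example. The sketch below is a grow-only set of directory entries, a textbook state-based CRDT; it is not ElmerFS's data model or its conflict resolution rules. The class `DirGSet` and the choice to tag each entry with the creating replica's id are assumptions made purely for illustration.

```python
from typing import FrozenSet, Tuple

Entry = Tuple[str, str]  # (file name, id of the replica that created it)


class DirGSet:
    """Grow-only set of directory entries: a toy state-based CRDT."""

    def __init__(self, entries: FrozenSet[Entry] = frozenset()) -> None:
        self.entries = entries

    def create(self, name: str, replica_id: str) -> "DirGSet":
        # Local update: add one entry, returning a new immutable state.
        return DirGSet(self.entries | {(name, replica_id)})

    def merge(self, other: "DirGSet") -> "DirGSet":
        # Set union is commutative, associative, and idempotent, so
        # replicas reach the same state whatever the merge order.
        return DirGSet(self.entries | other.entries)


# Two replicas concurrently create a file with the same name.
a = DirGSet().create("notes.txt", "A")
b = DirGSet().create("notes.txt", "B")

ab = a.merge(b)
ba = b.merge(a)
```

Both merge orders yield the same two entries, so neither concurrent creation is silently lost. A real file system would additionally need removals, renames, and a user-visible naming rule for the two entries, which this sketch does not attempt.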

    EXA2PRO : A Framework for High Development Productivity on Heterogeneous Computing Systems

    Programming upcoming exascale computing systems is expected to be a major challenge. New programming models are required to improve programmability by hiding the complexity of these systems from application developers. The EXA2PRO programming framework aims at improving developer productivity for applications that target heterogeneous computing systems. It is based on advanced programming models and abstractions that encapsulate low-level platform-specific optimizations, and it is supported by a runtime that handles application deployment on heterogeneous nodes. It supports a wide variety of platforms and accelerators (CPU, GPU, FPGA-based Data-Flow Engines), allowing developers to efficiently exploit heterogeneous computing systems, thus enabling more HPC applications to reach exascale computing. The EXA2PRO framework was evaluated using four HPC applications from different domains. By applying the EXA2PRO framework, the applications were automatically deployed and evaluated on a variety of computing architectures, enabling developers to obtain performance results on accelerators, test scalability on MPI clusters, and productively investigate the degree to which each application can efficiently use different types of hardware resources.
    Funding agencies: European Union's Horizon 2020 research and innovation programme [801015]; National Infrastructures for Research and Technology S.A. (GRNET) [SNIC 2020/13-113, SNIC 2016/5-6]; PRACE (Piz-Daint) [pr114]